Random intersection trees

Authors

  • Rajen Dinesh Shah
  • Nicolai Meinshausen
Abstract

Finding interactions between variables in large and high-dimensional data sets is often a serious computational challenge. Most approaches build up interaction sets incrementally, adding variables in a greedy fashion. The drawback is that potentially informative high-order interactions may be overlooked. Here, we propose an alternative approach for classification problems with binary predictor variables, called Random Intersection Trees. It works by starting with a maximal interaction that includes all variables, and then gradually removing variables if they fail to appear in randomly chosen observations of a class of interest. We show that informative interactions are retained with high probability, and the computational complexity of our procedure is of order p^κ, where p is the number of predictor variables. The exponent κ can be as low as 1 for very sparse data; in many more general settings, it will still beat the exponent s obtained when using a brute force search constrained to order s interactions. In addition, by using some new ideas based on min-wise hash schemes, we are able to further reduce the computational cost. Interactions found by our algorithm can be used for predictive modelling in various forms, but they are also often of interest in their own right as useful characterisations of what distinguishes a certain class from others.
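The intersection step described in the abstract can be illustrated with a minimal sketch. It assumes each observation of the class of interest is stored as the set of indices of its active (value 1) binary predictors. The function name `random_intersection_tree` and the parameters `depth` and `branching` are hypothetical choices for illustration, not taken from the paper, and the sketch leaves out the min-wise hashing speed-up and any comparison against other classes.

```python
import random


def random_intersection_tree(observations, all_vars, depth, branching, rng):
    """Grow one tree of intersections and return the sets at its leaves.

    observations: list of frozensets; each holds the indices of the
        predictors equal to 1 in one observation of the class of interest.
    all_vars: frozenset of every predictor index (the maximal interaction).
    depth, branching: illustrative tuning parameters (assumptions).
    """
    # Start from the maximal interaction that includes all variables.
    level = [all_vars]
    for _ in range(depth):
        next_level = []
        for node in level:
            for _ in range(branching):
                # Variables missing from a randomly chosen observation
                # are removed from the candidate interaction.
                child = node & rng.choice(observations)
                if child:  # an empty set carries no interaction
                    next_level.append(child)
        level = next_level
    return set(level)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: predictors 0 and 1 are active in every class observation.
    obs = [frozenset({0, 1, 2}), frozenset({0, 1, 3}), frozenset({0, 1, 4})]
    leaves = random_intersection_tree(obs, frozenset(range(5)), 3, 2, rng)
    for candidate in sorted(leaves, key=sorted):
        print(sorted(candidate))
```

In this toy data every surviving set contains predictors 0 and 1, because a variable that is active in all sampled observations can never be removed by intersection. Variables that are active only occasionally tend to drop out as the depth grows.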


Similar resources

A New Heuristic Algorithm for Drawing Binary Trees within Arbitrary Polygons Based on Center of Gravity

Graphs are widely used in software engineering, networking, and electrical engineering. Graph drawing is, in effect, a geometric representation of information. Among graphs, trees receive particular attention because of their capacity for hierarchical extension and their use in processing VLSI circuits. Many algorithms have been proposed for drawing binary trees within polygons. However, these algorithms generate b...


Intersection and mixing times for reversible chains

Suppose X and Y are two independent irreducible Markov chains on n states. We consider the intersection time, which is the first time their trajectories intersect. We show for reversible and lazy chains that the total variation mixing time is always upper bounded by the expected intersection time taken over the worst starting states. For random walks on trees we show the two quantities are equi...


The Upper Critical Dimension of the Abelian Sandpile Model

The existing estimation of the upper critical dimension of the Abelian Sandpile Model is based on a qualitative consideration of avalanches as self-avoiding branching processes. We find an exact representation of an avalanche as a sequence of spanning sub-trees of two-component spanning trees. Using equivalence between chemical paths on the spanning tree and loop-erased random walks, we reduce ...


Random Intersection Trees for finding interactions in large datasets

Finding interactions between variables in large and high-dimensional datasets is often a serious computational challenge. Because of the huge number of possible interactions, most approaches build up interaction sets incrementally, adding variables in a greedy fashion. In order for this to work, higher order interactions must contain informative lower order interactions. Important examples of t...


Pólya Urn Models and Connections to Random Trees: A Review

This paper reviews Pólya urn models and their connection to random trees. Basic results are presented, together with proofs that underlie the historical evolution of the accompanying thought process. Extensions and generalizations are given according to chronology: • Pólya-Eggenberger’s urn • Bernard Friedman’s urn • Generalized Pólya urns • Extended urn schemes • Invertible urn schemes ...


Branches in random recursive k-ary trees

In this paper, using generalized Pólya urn models, we find the expected value of the size of a branch in recursive k-ary trees. We also find the expectation of the number of nodes of a given outdegree in a branch of such trees.



Journal:
  • Journal of Machine Learning Research

Volume 15

Publication date: 2014